In this minilab, you will assess how the assumptions in the analysis of behavioral data can affect the outcomes. Specifically, you will explore and question a common assumption made when interpreting the information encoded in neural firing rates.
In the data wrangling tutorial, we saw that the neural responses of a single neuron can contain all the information necessary to predict an animal’s choice (recall the near-identity of the ideal observer’s predictions based on the ROC analysis with the monkey’s choices). From Zaidel, DeAngelis, & Angelaki (2017, p. 2):
The observation that activity of single sensory neurons in the brain can predict perceptual decisions, even before they are reported by the subject, has generated substantial and continued interest. This result has been corroborated by widespread findings of significant choice probabilities (CPs; a metric that quantifies the relationship between neuronal activity and perceptual decisions across repeated presentation of a stimulus).
However, as Zaidel et al further note (pp. 1-2), the interpretation of CPs is complex: high CPs could be driven by bottom-up processing (i.e., information about the stimulus being fed forward to the neuron) or by top-down feedback from decision areas. Neural responses might correlate with choice simply because neural responses correlate with the stimulus and choice also correlates with the stimulus. Zaidel et al use linear regression to tease apart the influences of stimulus and choice on neural spike rates, compare this approach to CPs, and find that CPs are a problematic measure.
In this minilab, you will extend their analysis to new data, and you will go one step further to assess the assumptions made in the linear regression approach in Zaidel et al. For this purpose you will analyze neurons in middle temporal (MT) cortex, using the same data set from Uka and DeAngelis (2003) that you’ve worked with in the data wrangling and visualization tutorials.
Please answer the following questions in an R markdown report submitted each week of the minilab. Describe your analysis plan. Try to get as far as you can. If you get stuck in R, try to describe in your own words what you’re trying to achieve. You can also add hand-drawn figures to the report, as long as you import them (e.g., take a photo of a plot with your phone and add it to the report).
You can ask other class members and the instructor for help at any point. Please use the Slack channel #general for that. This ensures that everyone benefits from whatever answers are provided, and that everyone has access to the same resources. There is no limit to the collaboration you can engage in, as long as it is through the Slack channel.
Be prepared to present your solutions during the next class.
Visualize the relation between Stimulus (x-axis), Choice (color), and spike rates (Spikes, y-axis). Do so for at least 20 cells. Is the effect of Stimulus linear? Pointers:
library("tidyverse")
library("magrittr")
library("mgcv") # for GAMMs
library("broom") # for elegant handling of model outputs
#devtools::install_github("tfjaeger/BCS511minilab1")
#library("BCS511minilab1")
load("~/Desktop/BehavioralScience/BCS511minilab1/data/spikes.rda")
theme_set(theme_bw())
# load data from library
#data(spikes)
# adding Stimulus variable
spikes %<>%
mutate(
Stimulus = BinocularR * DisparityFar
)
# subsetting 20 cells at random
d = spikes %>%
  filter(BinocularR != 0,
         ID %in% sample(levels(.$ID), 20))
#plotting raw data altogether
p = d %>%
ggplot(aes(x = Stimulus, y = Spikes, color = as.factor(Choice))) +
geom_point(size = 0.1, position = position_jitter(height = 0.01)) +
facet_wrap(~ID, ncol = 5) +
geom_smooth(method = "lm") +
scale_x_continuous("Stimulus") +
scale_color_discrete("Choice",
breaks = c("-1", "1"),
labels = c("near", "far")) +
scale_y_continuous("Spikes")
p
m1 = lm(Spikes ~ Stimulus + Choice, data = d)
summary(m1)
##
## Call:
## lm(formula = Spikes ~ Stimulus + Choice, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54.219 -26.660 -5.824 17.635 154.739
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.14950 0.40281 119.535 < 2e-16 ***
## Stimulus -0.07230 0.01488 -4.857 1.21e-06 ***
## Choice 0.87101 0.46482 1.874 0.061 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.48 on 7754 degrees of freedom
## Multiple R-squared: 0.003085, Adjusted R-squared: 0.002828
## F-statistic: 12 on 2 and 7754 DF, p-value: 6.276e-06
Using the same LM, calculate the partial correlation between Choice and Spikes for all unique combinations of Animal, Cell, and Run. This is the measure Zaidel et al propose as a correction to the traditional approach (choice probabilities). Pointers:
d.models =
d %>%
group_by(ID) %>%
do(
model = glm(Spikes ~ 1 + Stimulus + Choice, family = gaussian(identity), data = .)
)
d.models %>%
droplevels() %>%
glance(model)
d.models %>%
droplevels() %>%
# Set parametric to FALSE if you want the smooth terms instead (for gams only)
tidy(model, parametric = T)
d.models %>%
droplevels() %>%
augment(model)
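The models above give you a t-value for Choice; one standard way to recover the partial correlation that Zaidel et al work with is from that t-statistic and the residual degrees of freedom, via r = t / sqrt(t² + df). A minimal self-contained sketch of that conversion, using simulated trials (the variable names mirror the real columns, but all numbers here are made up):

```r
# Sketch: partial correlation between Choice and Spikes, given Stimulus,
# recovered from the regression t-statistic: r = t / sqrt(t^2 + df).
# Simulated data stand in for one cell's trials.
set.seed(1)
n <- 200
Stimulus <- rnorm(n)
Choice   <- sample(c(-1, 1), n, replace = TRUE)
Spikes   <- 50 + 2 * Stimulus + 3 * Choice + rnorm(n, sd = 10)

fit  <- lm(Spikes ~ Stimulus + Choice)
tval <- summary(fit)$coefficients["Choice", "t value"]
df   <- fit$df.residual
r_partial <- tval / sqrt(tval^2 + df)
r_partial  # partial correlation of Choice with Spikes, controlling for Stimulus
```

Applied per cell, this converts each model in `d.models` into a single choice-related partial correlation.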
In all cases, the p-values for Choice are below 0.05.
d.models3 = d %>%
group_by(ID) %>%
nest()
print(d.models3)
## # A tibble: 20 x 2
## # Groups: ID [99]
## ID data
## <fct> <list<df[,14]>>
## 1 m1c105r5 [468 × 14]
## 2 m1c107r7 [480 × 14]
## 3 m1c119r6 [480 × 14]
## 4 m1c125r5 [240 × 14]
## 5 m1c154r7 [366 × 14]
## 6 m1c162r6 [240 × 14]
## 7 m1c167r7 [299 × 14]
## 8 m1c185r6 [453 × 14]
## 9 m1c194r8 [614 × 14]
## 10 m1c195r5 [240 × 14]
## 11 m1c204r5 [240 × 14]
## 12 m2c106r8 [387 × 14]
## 13 m2c131r7 [480 × 14]
## 14 m2c135r5 [240 × 14]
## 15 m2c144r6 [200 × 14]
## 16 m2c145r8 [480 × 14]
## 17 m2c147r7 [480 × 14]
## 18 m2c151r7 [480 × 14]
## 19 m2c74r6 [480 × 14]
## 20 m2c82r7 [410 × 14]
d.models3 %<>%
mutate(
model = map(data, function(x) glm(Spikes ~ 1 + Stimulus + Choice, data = x)),
coefs = map(model, tidy),
goodness = map(model, glance),
prediction = map(model, augment)
)
print(d.models3)
## # A tibble: 20 x 6
## # Groups: ID [99]
## ID data model coefs goodness prediction
## <fct> <list<df[,14]>> <list> <list> <list> <list>
## 1 m1c105r5 [468 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [468 × 10]>
## 2 m1c107r7 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 3 m1c119r6 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 4 m1c125r5 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 5 m1c154r7 [366 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [366 × 10]>
## 6 m1c162r6 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 7 m1c167r7 [299 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [299 × 10]>
## 8 m1c185r6 [453 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [453 × 10]>
## 9 m1c194r8 [614 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [614 × 10]>
## 10 m1c195r5 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 11 m1c204r5 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 12 m2c106r8 [387 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [387 × 10]>
## 13 m2c131r7 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 14 m2c135r5 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 15 m2c144r6 [200 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [200 × 10]>
## 16 m2c145r8 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 17 m2c147r7 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 18 m2c151r7 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 19 m2c74r6 [480 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [480 × 10]>
## 20 m2c82r7 [410 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [410 × 10]>
d.models3 %>%
unnest(prediction, .drop = T)
## Warning: The `.drop` argument of `unnest()` is deprecated as of tidyr 1.0.0.
## All list-columns are now preserved.
## This warning is displayed once per session.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
BONUS:
Calculate Choice Probabilities (CPs) for all unique combinations of Animal, Cell, and Run. Compare them to the partial correlations obtained under point 3. Pointers:
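One way to compute a CP is as the area under the ROC curve comparing the spike-count distributions for the two choices, which equals the Mann-Whitney U statistic divided by n1 × n2 (with ties counted as 1/2). A self-contained sketch on simulated counts (the Poisson rates and trial counts are invented for illustration):

```r
# Sketch: choice probability (CP) for one cell, computed as the ROC area
# for spike counts split by choice. Simulated counts stand in for the data.
set.seed(1)
spikes_near <- rpois(100, lambda = 45)  # trials on which the monkey chose "near"
spikes_far  <- rpois(100, lambda = 55)  # trials on which the monkey chose "far"

choice_probability <- function(x, y) {
  # P(random draw from y exceeds random draw from x); ties count as 1/2
  u <- outer(y, x, FUN = function(a, b) (a > b) + 0.5 * (a == b))
  mean(u)
}
cp <- choice_probability(spikes_near, spikes_far)
cp  # 0.5 means no choice-related information; > 0.5 favors "far"
```

In the real data you would compute this per stimulus level (or on z-scored counts) before aggregating per cell, then plot CPs against the partial correlations from point 3.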
Do linear fits to the data (using only additive effects of Stimulus and Choice on Spikes) generally miss something about the data, compared to fits that allow non-linear effects of Stimulus? Pointers:
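One way to probe this is to fit the same cell twice, once with a linear Stimulus term and once with a smooth s(Stimulus), and compare goodness of fit (e.g., by AIC). A self-contained sketch with a simulated saturating tuning curve (the tanh shape and all constants are invented; real cells will differ):

```r
library("mgcv")

# Sketch: linear vs. non-linear (smooth) stimulus term for one simulated cell
set.seed(1)
n <- 300
Stimulus <- runif(n, -2, 2)
Choice   <- sample(c(-1, 1), n, replace = TRUE)
# saturating (non-linear) tuning curve plus a small choice effect
Spikes   <- 50 + 10 * tanh(2 * Stimulus) + Choice + rnorm(n, sd = 5)
d.sim    <- data.frame(Spikes, Stimulus, Choice)

m.lin    <- gam(Spikes ~ Stimulus + Choice, data = d.sim)     # linear in Stimulus
m.smooth <- gam(Spikes ~ s(Stimulus) + Choice, data = d.sim)  # smooth in Stimulus
AIC(m.lin, m.smooth)  # for this simulated cell the smooth fit has lower AIC
```

Fitting both with gam() keeps the two models directly comparable; broom's glance() can then collect the fit statistics across all cells.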
Do your findings with regard to question 2.1 mean that the estimates of the effect of Choice are biased in the linear model? Pointers:
Even if linear fits do not on average lead to biased estimates, they might be misleading for some cells. Identify the 5-10 cells for which the inferred choice effect differs most strongly depending on whether you make the linearity assumption (use an LM) or not (use a GAM).
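A sketch of one way to rank cells by how much the Choice estimate moves between the two model classes. Everything here is simulated: `sim_cell` is a made-up helper standing in for the nested per-ID data, and Choice is made to depend on Stimulus so that stimulus misfit can leak into the Choice estimate.

```r
library("mgcv")

set.seed(1)
# sim_cell: one hypothetical cell's trials (invented for illustration)
sim_cell <- function(id, nonlinear) {
  n <- 200
  Stimulus <- runif(n, -2, 2)
  Choice   <- ifelse(Stimulus + rnorm(n) > 0, 1, -1)  # choice tracks stimulus
  mu <- if (nonlinear) 10 * tanh(2 * Stimulus) else 5 * Stimulus
  data.frame(ID = id, Stimulus, Choice,
             Spikes = 50 + mu + Choice + rnorm(n, sd = 5))
}
cells <- rbind(sim_cell("c1", nonlinear = FALSE),
               sim_cell("c2", nonlinear = TRUE))

# per cell: how far does the Choice estimate move when we allow a smooth?
choice_shift <- sapply(split(cells, cells$ID), function(d.cell) {
  b.lin    <- coef(gam(Spikes ~ Stimulus + Choice, data = d.cell))["Choice"]
  b.smooth <- coef(gam(Spikes ~ s(Stimulus) + Choice, data = d.cell))["Choice"]
  abs(b.lin - b.smooth)
})
sort(choice_shift, decreasing = TRUE)  # largest shifts first
```

On the real data, the same comparison over all 99 IDs (e.g., via map() over the nested tibbles) lets you pick out the 5-10 cells with the largest shifts.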
BONUS:
LMs assume a fixed linear (or some other parametric) form for the relationship between the dependent variable and the covariates, whereas GAMs do not assume any specific form a priori and can therefore reveal and estimate non-linear effects of a covariate on the dependent variable.
So, based on this explanation:
- Use GAMs when the relationship between a covariate and the dependent variable might be non-linear (note that GAMs still assume the effects of different covariates combine additively).
- Use LMs when linearity is a justified assumption for your data.
## Quick overview of broom
One efficient way to apply models to many cells is through dplyr’s general-purpose verb do(), which underlies the more commonly used verbs like summarise(), mutate(), etc.:
d.models =
spikes %>%
group_by(ID) %>%
do(
model = glm(Spikes ~ 1 + Stimulus + Choice, family = gaussian(identity), data = .)
)
broom makes it easy to extract information from R models, such as lm(), glm(), and gam() fits. For example, glance() allows you to extract goodness-of-fit measures from each of these different types of models; tidy() extracts the coefficients, standard errors, statistics, and p-values into a data.frame; augment() gets the case-by-case predictions from a model and attaches them to a data.frame. Make sure to read the help files, e.g., help(augment.lm).
d.models %>%
glance(model)
d.models %>%
# Set parametric to FALSE if you want the smooth terms instead (for gams only)
tidy(model, parametric = T)
d.models %>%
augment(model)
Another way to achieve what we did above is to nest all the grouped data (essentially making a tibble within the tibble) and then use purrr’s awesome map() function to map the list of nested tibbles (the column called data) into the regression model as well as all its different summaries—simply via mutate():
d.models = spikes %>%
group_by(ID) %>%
nest()
print(d.models)
## # A tibble: 99 x 2
## # Groups: ID [99]
## ID data
## <fct> <list<df[,14]>>
## 1 m1c105r5 [546 × 14]
## 2 m1c107r7 [560 × 14]
## 3 m1c111r5 [560 × 14]
## 4 m1c112r4 [280 × 14]
## 5 m1c115r6 [240 × 14]
## 6 m1c116r4 [280 × 14]
## 7 m1c118r5 [372 × 14]
## 8 m1c119r6 [560 × 14]
## 9 m1c121r6 [560 × 14]
## 10 m1c125r5 [280 × 14]
## # … with 89 more rows
d.models %<>%
mutate(
model = map(data, function(x) glm(Spikes ~ 1 + Stimulus + Choice, data = x)),
coefs = map(model, tidy),
goodness = map(model, glance),
prediction = map(model, augment)
)
print(d.models)
## # A tibble: 99 x 6
## # Groups: ID [99]
## ID data model coefs goodness prediction
## <fct> <list<df[,14]>> <list> <list> <list> <list>
## 1 m1c105r5 [546 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [546 × 10]>
## 2 m1c107r7 [560 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [560 × 10]>
## 3 m1c111r5 [560 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [560 × 10]>
## 4 m1c112r4 [280 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [280 × 10]>
## 5 m1c115r6 [240 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [240 × 10]>
## 6 m1c116r4 [280 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [280 × 10]>
## 7 m1c118r5 [372 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [372 × 10]>
## 8 m1c119r6 [560 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [560 × 10]>
## 9 m1c121r6 [560 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [560 × 10]>
## 10 m1c125r5 [280 × 14] <glm> <tibble [3 × 5]> <tibble [1 × 7]> <tibble [280 × 10]>
## # … with 89 more rows
This tibble now contains the data and models for all cells, along with the coefficients (and their SEs, statistics, p-values), goodness of fit measures (AIC, BIC, etc.), and predictions. We can look at any of these by unnesting that part. E.g.:
d.models %>%
unnest(prediction, .drop = T)
This is very powerful. For example, it allows us to extract the coefficients and their CIs across many cells, which might come in handy for your lab.
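As a base-R sketch of that idea (broom's tidy() with conf.int = TRUE gives the equivalent per-model information), here is the Choice estimate with a 95% CI collected across several simulated cells; `sim_cell` and all its numbers are invented stand-ins for the real per-ID data:

```r
set.seed(1)
# sim_cell: a made-up stand-in for one cell's trials
sim_cell <- function(id) {
  n <- 150
  Stimulus <- rnorm(n)
  Choice   <- sample(c(-1, 1), n, replace = TRUE)
  data.frame(ID = id, Stimulus, Choice,
             Spikes = 50 + 2 * Stimulus + Choice + rnorm(n, sd = 8))
}
cells <- do.call(rbind, lapply(paste0("c", 1:5), sim_cell))

# per cell: Choice estimate plus a 95% confidence interval
cis <- do.call(rbind, lapply(split(cells, cells$ID), function(d.cell) {
  fit <- lm(Spikes ~ Stimulus + Choice, data = d.cell)
  data.frame(ID       = d.cell$ID[1],
             estimate = coef(fit)["Choice"],
             lower    = confint(fit)["Choice", 1],
             upper    = confint(fit)["Choice", 2])
}))
print(cis)
```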
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.1 (2019-07-05)
## os macOS Mojave 10.14.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/New_York
## date 2019-10-23
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
## broom * 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
## callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
## cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
## devtools 2.2.0 2019-09-07 [1] CRAN (R 3.6.0)
## digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
## dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
## DT 0.8 2019-08-07 [1] CRAN (R 3.6.0)
## ellipsis 0.2.0.1 2019-07-02 [1] CRAN (R 3.6.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
## forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
## fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
## generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
## ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
## glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## haven 2.1.1 2019-07-04 [1] CRAN (R 3.6.0)
## hms 0.5.1 2019-08-23 [1] CRAN (R 3.6.0)
## htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
## htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.6.0)
## httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
## jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0)
## knitr 1.24 2019-08-08 [1] CRAN (R 3.6.0)
## labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
## lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
## lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
## lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
## magrittr * 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.1)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
## mgcv * 1.8-28 2019-03-21 [1] CRAN (R 3.6.1)
## modelr 0.1.5 2019-08-08 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## nlme * 3.1-141 2019-08-01 [1] CRAN (R 3.6.0)
## pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
## pkgbuild 1.0.5 2019-08-26 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
## pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
## prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
## processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
## ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
## purrr * 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
## R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
## Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
## remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
## rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
## rmarkdown 1.15 2019-08-21 [1] CRAN (R 3.6.0)
## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
## rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
## rvest 0.3.4 2019-05-15 [1] CRAN (R 3.6.0)
## scales 1.0.0 2018-08-09 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
## tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
## tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.0)
## tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
## tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0)
## usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
## utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
## vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
## xfun 0.9 2019-08-21 [1] CRAN (R 3.6.0)
## xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
## zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
##
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library